-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: stream NVD data via Jackson to reduce memory footprint #6275
Conversation
Great work @philippn ! Sounds very promising. Hope we can get this merged :) Thank you! |
This isn't going to have the desired effect. The streaming changes would need to be made in the upstream library: https://github.com/jeremylong/Open-Vulnerability-Project/blob/c2fb94510a6632748436a9058a75ffa35d04e0b0/open-vulnerability-clients/src/main/java/io/github/jeremylong/openvulnerability/client/nvd/NvdCveClient.java#L325-L333 |
My previous PR, #6270, will drastically reduce memory usage. |
This change appears to build upon the previous changes in #6270 more than anything. No idea whether it's a significant enough improvement to warrant the extra complexity unless it was compared with the unreleased version of the code before this change, however conceptually it still seems like it could have advantages to try and process entries one-by-one as they are read from the files produced by #6270 rather than them being bulk read from files into a big collection and then the collection separately iterated through. But depends how large those collections/files are, and how much garbage is produced in the process. |
@jeremylong Thanks for your feedback, I appreciate it. This change aims primarily at improving the situation when using DependencyCheck with the As for #6196 , your fix definitely improves the situation (again, thanks for that). However, due to the way that the JSON is processed, it still loads the whole document into the memory right now. That is roughly 150 megs per year at the moment multiplied by the amount of concurrent threads (which reproducibly produces OOMs with limited memory). If you believe this to be better suited for the nvd lib, I would be happy to supply a patch there too or you are free to use the code wherever/however you want. Many thanks for your work and kind regards, |
This PR is somehow causing a regression with the maven integration tests. |
The test failures were caused by a bug in the initialization of the streaming, i.e. the first cve entry was always skipped. That should be fixed now. |
Thanks for the PR. I'll likely also look at the number of threads being spun up. |
Hi, please say if you may want a new issue for this rather than this follow-up here. I just noticed the update in the changelog specifically the JSON streaming, and had a look for interest's sake. Although the latest version has already got some improvements, I see here a hidden memory leak option in case of broken input. Previously, the I/O handling (managing streams, file handling) itself was handled in I would rather go with making a more universal constructor i.e. |
@knalli Your suggestion is reasonable, I would indeed recommend to open a new issue though (and/or perhaps a pull request). I could also have a go at it, during the holidays if necessary 🙂 |
-> #6326 |
Fixes Issue
This is a follow-up of #6196. One of the suggestions made in that thread was to switch to using JSON streaming using Jackson. I have implemented this in this Pull Request.
The results are quite impressive, the initial download and initialization now succeeds with Xmx512m, which was previously not possible.
Description of Change
Introduced new
CveItemSource
abstraction and using it inNvdApiProcessor
instead of reading from the JSON directly. The implementations ofCveItemSource
take care of the JSON streaming internally using Jackson Streaming API.Have test cases been added to cover the new functionality?
no, as there is not really any new functionalitity. But can provide one if necessary.